fix(security): close HTTP receive timing side-channel (#35)#43
Conversation
The #20 fix unified every HTTP receive failure to a bodyless 400, but response TIME still partitioned them: a held recipient kid runs full ECDH/AEAD (~360 us) while an unheld kid fast-fails before any crypto (~180 us), so one timed request enumerated which recipient keys the agent holds (the residual L-025 filed). Add a constant-time floor on the 400-rejection path: - New DidCommReceiveOptions.ReceiveRejectionFloor (default 5 ms; TimeSpan.Zero disables). Both MapDidCommEndpoint HTTP overloads pad the 400 with a non-cancelable, timer-backed Task.Delay up to the floor, so unheld-fast-fail and held-AEAD-fail pad to the same time. - 202 success and 415/413 are never padded (cost is paid only on rejected traffic). WebSocket is document-only (nothing written back). Adversarial pass (repo rule) found two issues: - Finding 1 (fixed): the floor clock was started at handler entry, before the <=1 MiB body read; a peer could pad the envelope to evict the floor. Moved the timestamp to after the (kid-independent) body read; added Large_body_rejection_is_still_padded_to_the_floor. - Finding 2 (documented residual): the held-only decrypt-then-network- DID-resolution path can exceed any fixed floor; unbounded network latency means a fixed floor cannot close it. Documented in the API docs + CHANGELOG; mitigation is auth / a rate-limiter. Root cause is upstream (JweParser.Parse/FindPresent fast-fails before ECDH) -> filed dataproofs-dotnet#12 + tracking #42. Tests: +6 (3 unit pad-helper, rejection-floored, large-body-floored, success-not-padded). Full suite green (656 tests). Builds clean under warnings-as-errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
The fix is well-reasoned. The adversarial pass caught a genuine flaw (Finding 1) before merge, the non-cancelable Task.Delay design is correct, and tests correctly use lower bounds on padded paths (can't flake upward on slow CI). Specific issues below, sorted by severity.
HIGH — Finding 2 needs a dedicated tracking issue
The PR correctly documents that the authcrypt+slow-resolver path (decrypt → network DID resolution → attacker-controlled did:webvh) is an unbounded timing residual a fixed floor cannot close: response ≫ floor ⇒ held. That disclosure is honest and correct. However, the mitigation guidance ("auth / a rate-limiter") is buried in the ReceiveRejectionFloor XML doc. A dedicated GitHub issue (separate from #42, which tracks the upstream JWE fix) should be opened so this residual has its own lifecycle. As written, an operator searching issues won't find it.
MEDIUM — Windows Task.Delay timer granularity
Task.Delay on Windows has ~15 ms timer resolution. The default floor is 5 ms, so on Windows it silently becomes ~15 ms. Harmless security-wise (15 ms still dominates the 180–360 µs crypto gap), but the XML doc says "Keep this small (single-digit milliseconds): 5 ms already dominates…" — which is accurate on Linux but misleading on Windows. Suggest one sentence in the ReceiveRejectionFloor doc:
On Windows,
Task.Delaytimer resolution is ~15 ms, so values below that threshold are silently rounded up; this does not affect the security property but affects throughput estimates on rejected traffic.
LOW — Successful_receive_is_not_padded_by_the_floor upper-bound assertion
sw.Elapsed.Should().BeLessThan(TimeSpan.FromSeconds(1)); // 2 s floor, 1 s boundThe margin is large enough that this should never flake, but on a heavily loaded CI runner a full in-process decrypt + DID resolve round-trip could creep toward 1 s. No action needed unless the Windows run shows a miss — just a watch item.
Nits
tasks/todo20260617T200730Z.mdcommitted — intentional per project workflow, fine.- The
#35:prefix is repeated in ~every comment block. The WHY is non-obvious enough to justify comments; a single consolidated note at the top ofNormalizedReceiveRejectionAsyncrather than repeated at each call site would be easier to maintain.
Summary: Core fix is correct, Finding 1 was caught and fixed before merge, Finding 2 is honestly disclosed. Two items worth doing before merge: (1) open a dedicated tracking issue for the network-DID-resolution timing residual, and (2) add one sentence about Windows timer granularity to the ReceiveRejectionFloor doc. Everything else is a nit or watch item.
…ndows timer note - Finding 2 (held-path network-DID-resolution timing residual) now has a dedicated tracking issue (#44); reference it from the ReceiveRejectionFloor doc, the DelayToRejectionFloorAsync LIMITATION note, and the CHANGELOG so an operator searching issues finds it (it was previously only in inline docs). - Document that Task.Delay rounds up to the Windows system timer resolution (~15 ms), so a sub-15 ms floor costs ~15 ms there — harmless to the security property, relevant to rejected-traffic throughput estimates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks — addressed in ada3af3:
Build clean under warnings-as-errors; full suite still green. |
Summary
Closes #35 (High) — the timing side-channel on the HTTP DIDComm receive path that enabled
recipient-kid enumeration (the residual the #20 red-team filed;
tasks/lessons.mdL-025).After #20 unified every HTTP receive failure to a bodyless
400, response time still partitionedthem: an envelope to a recipient key the agent holds runs full ECDH/AEAD (~360 µs) while one to a
key it doesn't hold fast-fails before any crypto (~180 µs). The held path's p10 sat entirely above
the unheld path's p90 — a single timed request reliably separated held from unheld.
Fix
A constant-time floor on the HTTP
400-rejection path:DidCommReceiveOptions.ReceiveRejectionFloor(default 5 ms;TimeSpan.Zerodisables). BothMapDidCommEndpointHTTP overloads pad the400with a non-cancelable, timer-backedTask.Delayup to the floor, so unheld-fast-fail and held-AEAD-fail pad to the same time and are indistinguishable.
Task.Delayis timer-backed (holds no thread), so padding a soon-to-be-rejected request is cheap.202success is never padded (hot path untouched);415/413pre-decrypt rejections are unchanged.Cost is paid only on rejected — typically hostile or malformed — traffic.
is no per-message response to time.
Behavior change: rejected HTTP requests now take ≥ the floor (default 5 ms). Pre-1.0, security fix.
Adversarial pass (repo rule) — two findings
≤
MaxReceiveBytes(1 MiB) body read. A peer could pad the envelope toward the cap so thekid-independent body read alone exhausted the floor, collapse the pad to zero, and re-expose the gap.
Moved the timestamp to after the body read so the floor budgets only the unpack window. Covered by
Large_body_rejection_is_still_padded_to_the_floor. A second agent verified closure and that no neworacle (e.g. via body-read time) was introduced.
decrypts and then does network DID resolution (authcrypt sender / FR-CONSIST-06 /
from_prior)can run longer than any sane floor — e.g. an attacker authcrypts to a guessed kid signed by an
attacker-controlled, deliberately-slow
did:webvhsender, so a held response visibly exceeds thefloor while an unheld one stays at it. Network latency is unbounded; a fixed floor can't close it
without an absurd value. Documented on the public API (
ReceiveRejectionFloor) + CHANGELOG; mitigationis authentication / a rate-limiter in front of the endpoint (the deployment tradeoff Timing side-channel on HTTP receive enables recipient-kid enumeration (residual of #20) #35 names).
Root cause (tracked upstream)
The local-crypto gap originates in
JweParser.Parse/FindPresent(DataProofsDotnet.Jose), whichfast-fails before ECDH when no recipient kid is held. The floor is a compensating edge control; the
durable constant-time-recipient-selection fix is filed upstream as moisesja/dataproofs-dotnet#12
and tracked here in #42.
Tests
Full suite green — 656 tests, 0 failures. +6 new in
AspNetCoreReceiveRoundTripTests:DelayToRejectionFloorAsync): waits out the floor; no-op atTimeSpan.Zero; no-op when work already exceeded the floor.Rejection_response_is_padded_to_the_floor(200 ms floor → ≥ 180 ms, lower-bound, not flaky).Large_body_rejection_is_still_padded_to_the_floor(Finding 1 regression).Successful_receive_is_not_padded_by_the_floor(2 s floor;202returns far faster).Builds clean under
TreatWarningsAsErrors(the conditionalawaitsatisfies CS1998).Files
src/DidComm.AspNetCore/DidCommReceiveOptions.cs—ReceiveRejectionFloor.src/DidComm.AspNetCore/DidCommEndpointRouteBuilderExtensions.cs— async rejection + pad helper, post-body-read clock, WS notes.tests/DidComm.InteropTests/Transports/AspNetCoreReceiveRoundTripTests.cs— new tests + harness floor knob.CHANGELOG.md,tasks/lessons.md(L-030),tasks/todo20260617T200730Z.md.🤖 Generated with Claude Code